Description

STORAGE VIRTUALIZATION COMPUTER SYSTEM AND EXTERNAL CONTROLLER THEREFOR

Background of Invention 
[0001] 1. Field of the Invention

[0002] The invention relates to a storage virtualization computer system. More particularly, a storage virtualization computer system that uses a point-to-point serial-signal interconnect as the primary device-side IO device interconnect is disclosed.

[0003] 2. Description of the Prior Art 

[0004] Storage virtualization is a technology that has been used to virtualize physical storage by combining sections of physical storage devices (PSDs) into logical storage entities, herein referred to as logical media units (LMUs), that are made accessible to a host system. This technology has been used primarily in redundant arrays of independent disks (RAID) storage virtualization, which combines smaller physical storage devices into larger, fault-tolerant, higher-performance logical media units via RAID technology.

[0005] A Storage Virtualization Controller, abbreviated SVC, is a device whose primary purpose is to map combinations of sections of physical storage media to logical media units visible to a host system. IO requests received from the host system are parsed and interpreted, and the associated operations and data are translated into physical storage device IO requests. This process may be indirect, with operations cached, delayed (e.g., write-back), anticipated (read-ahead), grouped, etc. to improve performance and other operational characteristics, so that a host IO request may not necessarily result directly in physical storage device IO requests in a one-to-one fashion.

[0006] An External (sometimes referred to as "Stand-alone") Storage Virtualization Controller is a Storage Virtualization Controller that connects to the host system via an IO interface, is capable of supporting connection to devices that reside external to the host system, and, in general, operates independently of the host.

[0007] One example of an external Storage Virtualization Controller is an external, or stand-alone, direct-access RAID controller. A RAID controller combines sections on one or multiple physical direct access storage devices (DASDs), the combination of which is determined by the nature of a particular RAID level, to form logical media units that are contiguously addressable by a host system to which the logical media unit is made available. A single RAID controller will typically support multiple RAID levels so that different logical media units may consist of sections of DASDs combined in different ways by virtue of the different RAID levels that characterize the different units.

[0008] Another example of an external Storage Virtualization Controller is a JBOD emulation controller. A JBOD, short for "Just a Bunch of Drives", is a set of physical DASDs that connect directly to a host system via one or more multiple-device IO device interconnect channels. DASDs that implement point-to-point IO device interconnects to connect to the host system (e.g., Parallel ATA HDDs, Serial ATA HDDs, etc.) cannot be directly combined to form a "JBOD" system as defined above, because they do not allow the connection of multiple devices directly to the IO device channel. An intelligent "JBOD emulation" device can be used to emulate multiple multiple-device IO device interconnect DASDs by mapping IO requests to physical DASDs that connect to the JBOD emulation device individually via the point-to-point IO device interconnection channels.

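
As a minimal illustration of the JBOD emulation idea, the Python sketch below presents each point-to-point DASD as one target on an emulated multiple-device channel and forwards each request unchanged over that DASD's dedicated link. The class name `JbodEmulator`, the port labels, and the target-ID numbering are hypothetical and do not come from the text.

```python
class JbodEmulator:
    """Hypothetical JBOD emulation: one emulated target per point-to-point DASD."""

    def __init__(self, ports: list[str]) -> None:
        # Emulated target ID i is backed by the DASD on ports[i] (illustrative numbering).
        self.targets = dict(enumerate(ports))

    def route(self, target_id: int, lba: int, count: int) -> tuple[str, int, int]:
        """Forward a request for an emulated target to its dedicated port, unchanged."""
        if target_id not in self.targets:
            raise LookupError(f"no emulated device at target {target_id}")
        return self.targets[target_id], lba, count


jbod = JbodEmulator(["sata0", "sata1", "sata2"])
print(jbod.route(1, 2048, 16))  # ('sata1', 2048, 16)
```

Unlike a RAID controller, the emulator combines no sections: each emulated device maps one-to-one onto a single physical DASD.
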
[0009] Another example of an external Storage Virtualization Controller is a controller for an external tape backup subsystem.

[0010] The primary function of a storage virtualization controller, abbreviated as SVC, is to manage, combine, and manipulate physical storage devices in such a way as to present them as a set of logical media units to the host. Each LMU is presented to the host as if it were a directly-connected physical storage device (PSD) of which the LMU is supposed to be the logical equivalent. To accomplish this, IO requests sent by the host to be processed by the SVC, which would normally generate certain behavior in an equivalent PSD, also generate logically equivalent behavior on the part of the SVC in relation to the addressed logical media unit. The result is that the host "thinks" it is directly connected to and communicating with a PSD when in actuality the host is connected to an SVC that is simply emulating the behavior of the PSD of which the addressed logical media unit is the logical equivalent.

[0011] In order to achieve this behavioral emulation, the SVC maps IO requests received from the host to logically equivalent internal operations. Some of these operations can be completed without the need to directly generate any device-side IO requests to device-side PSDs. Among these are operations that are processed internally only, without ever needing to access the device-side PSDs. The operations that are initiated as a result of such IO requests will herein be termed "internally-emulated operations".

[0012] There are operations that cannot be performed simply through internal emulation and yet may not directly result in device-side PSD accesses. Examples include cached operations, such as data read operations in which valid data corresponding to the media section addressed by the IO request currently happens to reside entirely in the SVC's data cache, or data write operations when the SVC's cache is operating in write-back mode so that data is written into the cache only at first, to be committed to the appropriate PSDs at a future time. Such operations will be referred to as "asynchronous device operations" (meaning that any actual IO requests to device-side PSDs that must transpire in order for the requested operation to achieve its intended goal are indirectly performed either prior or subsequent to the operation rather than directly in response to the operation).

[0013] Yet another class of operations consists of those that directly generate device-side IO requests to PSDs in order to complete. Such operations will be referred to as "synchronous device operations".

[0014] Some host-side IO requests may map to an operation that consists of multiple sub-operations of different classes, including internally-emulated, asynchronous device and/or synchronous device operations. An example of a host-side IO request that maps to a combination of asynchronous and synchronous device operations is a data read request that addresses a section of media in the logical media unit, part of whose corresponding data currently resides in cache and part of whose data does not reside in cache and therefore must be read from the PSDs. The sub-operation that takes data from the cache is an asynchronous one because it does not directly require device-side PSD accesses to complete; it does, however, indirectly rely on the results of previously executed device-side PSD accesses. The sub-operation that reads data from the PSDs is a synchronous one, for it requires direct and immediate device-side PSD accesses in order to complete.
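
To make the classification concrete, the following Python sketch splits a host read of a block range into contiguous runs, treating cached runs as asynchronous device operations and uncached runs as synchronous device operations. It assumes, purely for illustration, that the cache is tracked as a set of block numbers; the names `SubOp` and `split_read` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubOp:
    kind: str   # "asynchronous" (served from cache) or "synchronous" (needs a PSD read)
    start: int  # first block of the run
    count: int  # number of contiguous blocks in the run


def split_read(start: int, count: int, cached: set[int]) -> list[SubOp]:
    """Split a read of blocks [start, start + count) into runs by cache residency."""
    ops: list[SubOp] = []
    for block in range(start, start + count):
        kind = "asynchronous" if block in cached else "synchronous"
        if ops and ops[-1].kind == kind:
            # Extend the current run; blocks are visited in order, so runs stay contiguous.
            last = ops[-1]
            ops[-1] = SubOp(kind, last.start, last.count + 1)
        else:
            ops.append(SubOp(kind, block, 1))
    return ops
```

For example, `split_read(100, 6, {100, 101, 102})` yields one asynchronous run covering blocks 100 to 102 and one synchronous run covering blocks 103 to 105.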

[0015] Traditionally, storage virtualization has been done with Parallel SCSI, Fibre, or Parallel ATA IO device interconnects as the primary device-side IO device interconnects connecting physical storage devices to the storage virtualization controller. Both Parallel SCSI and Fibre are multiple-device IO device interconnects. Multiple-device IO device interconnects share bandwidth among all hosts and all devices interconnected by the interconnects.

[0016] Please refer to Fig. 1, which is a block diagram of a conventional storage virtualization computer system using Parallel SCSI as the primary device-side IO device interconnect. With the Parallel SCSI device-side IO device interconnect, the total bandwidth is limited to 320 MB/s per interconnect, or 1280 MB/s accumulated bandwidth for an implementation like the one diagrammed in Fig. 1 with 4 Parallel SCSI device-side interconnects. Please refer to Fig. 2, which is a block diagram of a conventional storage virtualization computer system using Fibre FC-AL as the primary device-side IO device interconnect. With the Fibre FC-AL device-side IO device interconnect, the total bandwidth is limited to 200 MB/s per interconnect, or 800 MB/s accumulated bandwidth for an implementation like the one diagrammed in Fig. 2 with 4 Fibre device-side interconnects.

[0017] Multiple-device IO device interconnects, for example Parallel SCSI, suffer from the shortcoming that a single failing device connected on the interconnect could interfere with communications and/or data transfer between hosts and other devices sharing the interconnect. Fibre FC-AL implementations have alleviated this concern to a certain extent by providing dual redundant interconnects that provide two paths to each device should one path break or become blocked for some reason (e.g., interference from another failing device). However, this is still inferior to a dedicated interconnect per storage device, for independent failures on both interconnects could still result in disabling interference concurrently on both interconnects. Dedicated interconnects, on the other hand, ensure full signal integrity independence between interconnects so that a failure on one will not affect another.

[0018] Storage virtualization has also traditionally been done with Parallel ATA device-side IO device interconnects; Parallel ATA is a point-to-point IO device interconnect using parallel signal transmission. By using point-to-point interconnects, with each physical storage device having its own dedicated interconnect connecting it to the hosts, each particular physical storage device is afforded guaranteed, dedicated bandwidth such that N physical storage devices can achieve N times the bandwidth of a single interconnect.
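
The bandwidth figures in the two preceding paragraphs follow from simple multiplication, sketched in Python below. The helper names are hypothetical, the per-device share assumes every device on a shared interconnect is busy and gets an equal share, and the per-link rate passed to `point_to_point_total` is a free parameter rather than a figure taken from the text.

```python
def shared_total(per_interconnect_mb_s: float, interconnects: int) -> float:
    """Accumulated bandwidth of several multiple-device interconnects."""
    return per_interconnect_mb_s * interconnects


def shared_per_device(per_interconnect_mb_s: float, devices_on_interconnect: int) -> float:
    """Equal share per device when all devices on one shared interconnect are busy."""
    return per_interconnect_mb_s / devices_on_interconnect


def point_to_point_total(per_link_mb_s: float, devices: int) -> float:
    """With one dedicated link per PSD, N devices get N times one link's bandwidth."""
    return per_link_mb_s * devices


print(shared_total(320, 4))  # 1280 MB/s, Parallel SCSI example of Fig. 1
print(shared_total(200, 4))  # 800 MB/s, Fibre FC-AL example of Fig. 2
```

The contrast is in how the totals scale: adding devices to a shared interconnect divides its bandwidth, while adding point-to-point devices adds links.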

[0019] Parallel ATA, however, suffers from the drawback that its device-side IO device interconnects protect, at most, only the payload data portion of information and not the control information (e.g., block address, data length, etc.). In addition, Parallel ATA interconnects do not scale well beyond a certain point because of the number of dedicated signal lines (28) that must be used to form each distinct interconnect. Moreover, because of its parallel nature, P-ATA will not be able to easily support higher interface speeds.

Summary of Invention 

[0020] It is therefore a primary objective of the claimed invention to provide a storage virtualization computer system using point-to-point serial-signal transmissions as the primary device-side IO device interconnects to solve the above-mentioned problems.

[0021] According to the claimed invention, a storage virtualization computer system is introduced. The storage virtualization computer system comprises a host entity for issuing an IO request, an external storage virtualization controller coupled to the host entity for executing IO operations in response to the IO request, and at least one physical storage device, each coupled to the storage virtualization controller through a point-to-point serial-signal interconnect, for providing storage to the storage virtualization computer system through the storage virtualization controller. In one embodiment, the point-to-point serial-signal interconnect is a Serial ATA IO device interconnect.

[0022] It is an advantage of the claimed invention that, in the storage virtualization computer system using Serial ATA as the primary device-side IO device interconnect, each physical storage device has a dedicated interconnect to the storage virtualization controller.

[0023] It is another advantage of the claimed invention that not only the payload data portion of information but also the control information is protected by the S-ATA IO device interconnect.

[0024] These and other objectives of the present invention will no doubt become obvious to those of ordinary skill in the art after reading the following detailed description of the preferred embodiment that is illustrated in the various figures and drawings.



Brief Description of Drawings 

[0025] Fig. 1 is a block diagram of a conventional storage virtualization computer system using Parallel SCSI as the primary device-side IO device interconnect.

[0026] Fig. 2 is a block diagram of a conventional storage virtualization computer system using Fibre FC-AL as the primary device-side IO device interconnect.

[0027] Fig. 3 is a block diagram of a storage virtualization computer system according to the present invention.

[0028] Fig. 4 is an embodiment block diagram of the SVC and the 
connection thereof to the host entity and the PSD array in 
Fig. 3. 

[0029] Fig. 5 is an embodiment block diagram of the CPC in Fig. 4. 

[0030] Fig. 6 is an embodiment block diagram of the CPU chipset/parity engine in Fig. 5.

[0031] Fig. 7 is a block diagram of the SATA IO device interconnect controller of Fig. 4.

[0032] Fig. 8 is a block diagram of the PCI-X to SATA controller of Fig. 7.

[0033] Fig. 9 is a block diagram of the SATA port of Fig. 8. 

[0034] Fig. 10 illustrates the transmission structure complying 
with serial ATA protocol. 



[0035] Fig. 11 illustrates a first FIS data structure complying with 
serial ATA protocol. 

[0036] Fig. 12 illustrates a second FIS data structure complying 
with serial ATA protocol. 

[0037] Fig. 13 and Fig. 14 are examples of the IO flow between the host entity and the SVC of Fig. 3.

[0038] Fig. 15 and Fig. 16 are examples of the IO flow between the SVC and a PSD of Fig. 3.

[0039] Fig. 17 is a block diagram of a storage virtualization sub- 
system supporting device-side expansion ports. 

[0040] Fig. 18 is a block diagram of a storage virtualization sub- 
system supporting device-side expansion ports. 

[0041] Fig. 19 is a block diagram of a removable P-ATA-PSD canister.

[0042] Fig. 20 is a block diagram of a removable S-ATA-PSD canister.

Detailed Description 

[0043] Please refer to Fig. 3, which is an embodiment block diagram of a storage virtualization computer system according to the present invention. In Fig. 3, Serial ATA is used as the primary device-side IO device interconnect. The computer system comprises a host entity 10 and a connected storage virtualization subsystem (SVS) 20. Although Fig. 3 illustrates only one host entity 10 connected to one SVS 20, more than one SVS 20 can be attached to the host entity 10, more than one host entity 10 can be attached to the SVS 20, or both.

[0044] The host entity 10 can be a host computer, such as a server system, a workstation, a PC system, or the like. The SVS 20 comprises a storage virtualization controller 200, which can be a RAID controller or a JBOD emulator, and a physical storage device array (PSD array) 400 connected by Serial ATA interconnect 201. Although only one PSD array 400 is illustrated here, more than one PSD array 400 can be attached to the SVC 200. Also, the host entity 10 can be another SVC.

[0045] The SVC 200 receives IO requests and related data (the control signals and data signals) from the host entity 10 and executes the IO requests internally or maps the IO requests to the PSD array 400. The PSD array 400 comprises a plurality of physical storage devices 420, which can be hard disk drives (HDDs), for example. The SVC 200 can be used to enhance performance and/or improve data availability and/or increase the storage capacity of a single logical media unit as viewed by the host entity 10.

[0046] When a logical media unit in the SVS 20 is set to use a RAID level other than levels 0 or 1, for example, levels 3 through 5, the PSD array 400 comprises at least one parity PSD, that is, a PSD 420 which contains parity data, and data availability can thus be improved. In addition, performance can be improved in the execution of an IO operation, since the accessed data is distributed among more than one PSD 420. Moreover, since the logical media unit is a combination of sections of a plurality of PSDs 420, the accessible storage capacity in a single logical media unit can be greatly increased. For example, in a RAID subsystem of RAID level 5, all of the functionality described above can be achieved.
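
The parity that makes RAID levels 3 through 5 fault tolerant is a byte-wise XOR across the data sections of a stripe, so any single lost section can be rebuilt by XORing the surviving sections with the parity. The Python sketch below shows only this arithmetic for one stripe; it does not model how RAID 5 rotates the parity among the PSDs, and the function names are illustrative.

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks in a stripe must have the same length")
    out = bytearray(length)
    for block in blocks:
        for i, value in enumerate(block):
            out[i] ^= value
    return bytes(out)


def stripe_parity(data_sections: list[bytes]) -> bytes:
    """Parity section for one stripe: the XOR of all its data sections."""
    return xor_blocks(data_sections)


def rebuild_section(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the single missing data section from the survivors and the parity."""
    return xor_blocks(surviving + [parity])


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]  # one stripe across three data PSDs
parity = stripe_parity(data)                     # equals b"\xee\x22"
assert rebuild_section([data[0], data[2]], parity) == data[1]
```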

[0047] When a logical media unit in the SVS 20 is set to use RAID level 1, the same data will be stored in two PSDs 420, and thus data availability can be greatly enhanced at the cost of doubling the number of PSDs 420, and hence the PSD cost.

[0048] When a logical media unit in the SVS 20 is set to use RAID level 0, performance improvement prevails over availability concerns, and thus no enhancement of data availability is provided. Performance, however, can be greatly improved. For example, a RAID subsystem of RAID level 0 having 2 hard disk drives can theoretically have 200% of the performance of a storage device having only one hard disk drive, since different data sections can be stored into the two separate hard disk drives at the same time under the control of the SVC 200.
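
The 200% figure rests on striping: consecutive stripe units of the logical media unit alternate between the drives, so a large transfer keeps both busy at once. The Python sketch below maps a logical block address to a drive and a block on that drive, assuming a round-robin layout with a fixed stripe-unit size in blocks; `raid0_map` and `stripe_blocks` are hypothetical names.

```python
def raid0_map(lba: int, num_disks: int, stripe_blocks: int) -> tuple[int, int]:
    """Map an LMU block address to (disk index, block address on that disk)."""
    stripe_unit = lba // stripe_blocks   # which stripe unit of the LMU holds this block
    offset = lba % stripe_blocks         # position of the block inside that stripe unit
    disk = stripe_unit % num_disks       # stripe units rotate round-robin across disks
    row = stripe_unit // num_disks       # how many full rows precede it on that disk
    return disk, row * stripe_blocks + offset
```

With 2 drives and 4-block stripe units, blocks 0 to 3 land on drive 0, blocks 4 to 7 on drive 1, and block 8 wraps back to drive 0 at drive block 4.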

[0049] Please refer to Fig. 4, which is an embodiment block diagram of the SVC 200 and the connection thereof to the host entity 10 and the PSD array 400. In this embodiment, the SVC 200 comprises a host-side IO device interconnect controller 220, a central processing circuit (abbreviated CPC) 240, a memory 280, and a Serial ATA (S-ATA) IO device interconnect controller 300. Although illustrated as separate functional blocks, two or more or even all of these functional blocks can be incorporated into one chip in a practical implementation.

[0050] The host-side IO device interconnect controller 220 is connected to the host entity 10 and the CPC 240 to serve as an interface and buffer between the SVC 200 and the host entity 10; it receives IO requests and related data from the host entity 10 and maps and/or transfers them to the CPC 240. The host-side IO device interconnect controller 220 comprises one or more host-side ports for coupling to the host entity 10. Some common port types that might be incorporated here are: Fibre Channel supporting Fabric, point-to-point, public loop and/or private loop connectivity in target mode; parallel SCSI operating in target mode; Ethernet supporting the iSCSI protocol operating in target mode; Serial-Attached SCSI (SAS) operating in target mode; and Serial ATA operating in target mode.

[0051] When the CPC 240 receives the IO requests of the host entity 10 from the host-side IO device interconnect controller 220, the CPC 240 parses them, performs some operations in response to the IO requests, and sends the requested data and/or reports and/or information from the SVC 200 back to the host entity 10 through the host-side IO device interconnect controller 220.

[0052] After parsing an IO request received from the host entity 10, when a read request is received and one or more operations are performed in response, the CPC 240 gets the requested data internally, from the memory 280, or both, and transfers them to the host entity 10. If the data is neither available internally nor present in the memory 280, the IO request will be issued to the PSD array 400 through the SATA IO device interconnect controller 300, and the requested data will be transferred from the PSD array 400 to the memory 280 and then passed to the host entity 10 through the host-side IO device interconnect controller 220.



[0053] When a write request is received from the host entity 10, after parsing the request and performing one or more operations, the CPC 240 gets the data from the host entity 10 through the host-side IO device interconnect controller 220, stores them in the memory 280 and then, for asynchronous or synchronous device operations, transmits the data to the PSD array 400 through the SATA IO device interconnect controller 300. When the write request is a write-back request, the IO-complete report can be issued to the host entity 10 first, and the CPC 240 performs the actual write operation later. When the write request is a write-through request, the IO-complete report is issued to the host entity 10 after the requested data is actually written into the PSD array 400.
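
The difference between write-back and write-through lies only in when the IO-complete report is issued relative to the PSD write. The toy Python model below records that ordering; the class `WriteCache`, its event strings, and the use of dictionaries to stand in for the memory 280 and the PSD array 400 are all assumptions made for illustration.

```python
class WriteCache:
    """Toy model of memory 280 buffering host writes in front of the PSD array 400."""

    def __init__(self, write_back: bool) -> None:
        self.write_back = write_back
        self.cache: dict[int, bytes] = {}  # block -> data held in memory 280
        self.dirty: set[int] = set()       # cached blocks not yet committed to the PSDs
        self.psd: dict[int, bytes] = {}    # stand-in for the PSD array 400
        self.events: list[str] = []        # order in which things happen

    def host_write(self, block: int, data: bytes) -> None:
        self.cache[block] = data
        if self.write_back:
            self.dirty.add(block)
            self.events.append("io-complete")  # report now, commit on a later flush
        else:
            self.psd[block] = data
            self.events.append("psd-write")
            self.events.append("io-complete")  # report only after the PSD write

    def flush(self) -> None:
        """Commit dirty blocks to the PSDs (the deferred write of write-back mode)."""
        for block in sorted(self.dirty):
            self.psd[block] = self.cache[block]
            self.events.append("psd-write")
        self.dirty.clear()
```

Write-back answers the host sooner but leaves dirty data in memory 280 until `flush` runs, which is why status information about unfinished operations matters after a power loss.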

[0054] The memory 280 is connected to the CPC 240 and acts as a buffer for the data transferred between the host entity 10 and the PSD array 400 through the CPC 240. In one embodiment, the memory 280 can be a DRAM or, more particularly, an SDRAM.

[0055] The SATA IO device interconnect controller 300 is the device-side IO device interconnect controller connected between the CPC 240 and the PSD array 400. It serves as an interface and buffer between the SVC 200 and the PSD array 400, receives IO requests and related data issued from the CPC 240, and maps and/or transfers them to the PSD array 400. The SATA IO device interconnect controller 300 re-formats the data and control signals received from the CPC 240 to comply with the S-ATA protocol and transmits them to the PSD array 400.

[0056] In this embodiment, an enclosure management service (EMS) circuitry 360 is attached to the CPC 240 for managing and monitoring at least one of the following devices belonging to the storage virtualization subsystem 20: power supplies, fans, temperature sensors, voltages, uninterruptible power supplies, batteries, LEDs, audible alarms, PSD canister locks, and door locks. However, in another arrangement of the SVS 20, the EMS circuitry 360 can be omitted, depending on the functional requirements of the particular product. Alternatively, the function of the EMS circuitry 360 can be incorporated into the CPC 240. Aspects of the EMS will be discussed later.

[0057] In Fig. 5, an embodiment of the CPC 240 is shown, comprising the CPU chipset/parity engine 244, the CPU 242, a ROM (Read Only Memory) 246, an NVRAM (Non-volatile RAM) 248, an LCD module 350 and an enclosure management service circuitry EMS 360. The CPU can be, e.g., a PowerPC CPU. The ROM 246 can be a FLASH memory for storing the BIOS and/or other programs. The NVRAM is provided for saving information regarding the IO operation execution status of the array, which can be examined after an abnormal power shut-off occurs in the event that IO operation execution does not complete. The LCD module 350 displays the operating status of the subsystem. The EMS 360 can control the power of the DASD array and perform other management functions. The ROM 246, the NVRAM 248, the LCD module 350 and the enclosure management service circuitry EMS 360 are connected to the CPU chipset/parity engine 244 through an X-bus.

[0058] Fig. 6 is a block diagram illustrating an embodiment of the CPU chipset/parity engine 244 according to the present invention. In the present embodiment, the CPU chipset/parity engine 244 mainly comprises a parity engine 260, a CPU interface 910, a memory interface 920, PCI interfaces 930, 932, an X-BUS interface 940, and a PM BUS 950. The PM BUS 950 is, for example, a 64-bit, 133 MHz bus and connects the parity engine 260, CPU interface 910, memory interface 920, PCI interfaces 930, 932, and X-BUS interface 940 together for communicating data signals and control signals among them.

[0059] Data and control signals from the host-side IO device interconnect controller 220 enter the CPU chipset/parity engine 244 through the PCI interface 930 and are buffered in the PM FIFO 934. The PCI interface 930 to the host-side IO device interconnect controller 220 can be, for example, 64 bits wide at 66 MHz. During the PCI slave cycle, the PCI interface 930 owns the PM bus 950, and the data and control signals in the PM FIFO 934 are then transmitted to either the memory interface 920 or the CPU interface 910.

[0060] The data and control signals received by the CPU interface 910 from the PM bus 950 are transmitted to the CPU 242 for further processing. Communication between the CPU interface 910 and the CPU 242 can be performed, for example, through a 64-bit data line and a 32-bit address line. The data and control signals can be transmitted to the memory interface 920 through a CM FIFO 922 that is 64 bits wide at 133 MHz.

[0061] An ECC (Error Correction Code) circuit 924 is also provided and connected between the CM FIFO 922 and the memory interface 920 to generate ECC code. The ECC code can be generated, for example, by XORing 8 bits of data to produce one bit of ECC code. The memory interface 920 then stores the data and ECC code in the memory 280, for example, an SDRAM. The data in the memory 280 is transmitted to the PM bus 950 through the ECC correction circuit 926 and compared with the ECC code from the ECC circuit 924. The ECC correction circuit 926 is capable of one-bit automatic correction and multi-bit error detection.
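
A minimal Python sketch of the check-bit generation described above, assuming a 64-bit memory word and assuming that each of the 8 check bits is the XOR of one byte lane (the text does not say which 8 data bits feed each ECC bit). Per-byte parity of this kind only detects an odd number of flipped bits in a lane; the one-bit correction attributed to the ECC correction circuit 926 would require a stronger code, such as a Hamming-based SEC-DED code, which is not modeled here. The function names are hypothetical.

```python
def ecc_bits(word: int) -> int:
    """8 check bits for a 64-bit word; bit i is the XOR of the 8 bits of byte lane i."""
    check = 0
    for lane in range(8):
        byte = (word >> (8 * lane)) & 0xFF
        parity = bin(byte).count("1") & 1  # XOR of the byte's 8 bits
        check |= parity << lane
    return check


def mismatched_lanes(word: int, stored_check: int) -> list[int]:
    """Byte lanes whose recomputed check bit differs from the stored one."""
    diff = ecc_bits(word) ^ stored_check
    return [lane for lane in range(8) if (diff >> lane) & 1]
```

For instance, flipping bit 17 of a stored word is reported in lane 2, while flipping bits 16 and 17 together leaves the lane's parity unchanged and goes unnoticed.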

[0062] The parity engine 260 can perform the parity functionality of a certain RAID level in response to instructions from the CPU 242. The parity engine 260 can also be shut off to perform no parity functionality at all in some situations, for example, in a RAID level 0 case. In one embodiment, as shown in Fig. 6, the parity engine 260 can include an XOR engine 262 connected to the PM bus 950 through an XOR FIFO 264. The XOR engine 262 can perform, for example, the XOR function for a memory location with a given address and length.
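
The XOR engine's job can be pictured as XORing one memory region into another, given addresses and a length. In the Python sketch below, memory is a `bytearray`, and the signature `xor_engine(memory, dst, src, length)` is a hypothetical interface, not the register-level programming model of the XOR engine 262. The source and destination regions are assumed not to overlap.

```python
def xor_engine(memory: bytearray, dst: int, src: int, length: int) -> None:
    """XOR `length` bytes starting at address `src` into the region starting at `dst`."""
    if dst < 0 or src < 0 or max(dst, src) + length > len(memory):
        raise IndexError("region outside memory")
    for i in range(length):
        memory[dst + i] ^= memory[src + i]


# Two 2-byte data buffers at addresses 0 and 2, and a zeroed parity buffer at address 4.
mem = bytearray(b"\x0f\xf0\x33\x33\x00\x00")
xor_engine(mem, 4, 0, 2)  # parity ^= buffer at 0
xor_engine(mem, 4, 2, 2)  # parity ^= buffer at 2
```

Forming a stripe's parity this way needs only one zeroed region and one XOR pass per data buffer.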

[0063] The PLL (Phase Locked Loop) 980 is provided for maintaining desirable phase shifts between related signals. The timer controller 982 is provided as a timing base for various clocks and signals. The internal registers 984 are provided to register the status of the CPU chipset/parity engine 244 and to control the traffic on the PM bus 950. In addition, a pair of UART functionality blocks 986 are provided so that the CPU chipset/parity engine 244 can communicate with the outside through an RS232 interface.

[0064] In an alternative embodiment, PCI-X interfaces can be used in place of the PCI interfaces 930, 932. In another alternative embodiment, PCI Express interfaces can be used in place of the PCI interfaces 930, 932. Those skilled in the art will know that either replacement can be easily accomplished without any difficulty.

[0065] Please refer to Fig. 7, which is an embodiment block diagram of the SATA IO device interconnect controller 300 of Fig. 4. According to the present embodiment, the SATA IO device interconnect controller 300 comprises two PCI-X to SATA controllers 310. Fig. 8 is an embodiment block diagram of the PCI-X to S-ATA controller 310 of Fig. 7. As shown in Fig. 8, each PCI-X to SATA controller 310 comprises a PCI-X interface 312 connected to the CPC 240, a Dec/Mux arbiter 314 connected to the PCI-X interface 312, and 8 SATA ports 600 connected to the Dec/Mux arbiter 314. The PCI-X interface 312 comprises a bus interface 318 connecting to the Dec/Mux arbiter 314 and a configuration circuitry 316 storing the configuration of the PCI-X to SATA controller 310. The Dec/Mux arbiter 314 performs arbitration between the PCI-X interface 312 and the plurality of SATA ports 600, as well as address decoding of the transactions from the PCI-X interface 312 to the SATA ports 600. The data and control signals are transmitted to a PSD 420 through a SATA port 600 of the PCI-X to SATA controller 310. In an alternative embodiment, a PCI to SATA controller can be used in place of the PCI-X to SATA controller 310; in the PCI to SATA controller, a PCI interface 312 (not shown) is used in place of the PCI-X interface 312. In another alternative embodiment, a PCI Express to SATA controller can be used in place of the PCI-X to SATA controller 310; in the PCI Express to SATA controller, a PCI Express interface is used in place of the PCI-X interface 312. Those skilled in the art will know that either replacement can be easily accomplished without any difficulty.

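
The text does not specify how the Dec/Mux arbiter 314 decodes addresses or chooses among ports. The Python sketch below assumes, hypothetically, that each of the 8 SATA ports 600 claims a fixed-size address window and that access is granted round-robin among requesting ports; both policies, and the class name `DecMuxArbiter`, are illustrative choices only.

```python
from typing import Optional


class DecMuxArbiter:
    """Hypothetical model: window-based address decoding plus round-robin arbitration."""

    def __init__(self, base: int, window: int, num_ports: int = 8) -> None:
        self.base, self.window, self.num_ports = base, window, num_ports
        self.next_port = 0  # round-robin pointer

    def decode(self, address: int) -> int:
        """Return the port whose address window contains `address`."""
        index = (address - self.base) // self.window
        if address < self.base or index >= self.num_ports:
            raise ValueError(f"address {address:#x} is not claimed by any port")
        return index

    def grant(self, requesting: set[int]) -> Optional[int]:
        """Grant access to the next requesting port after the last one served."""
        for step in range(self.num_ports):
            port = (self.next_port + step) % self.num_ports
            if port in requesting:
                self.next_port = (port + 1) % self.num_ports
                return port
        return None
```

Round-robin is chosen here only because it keeps one busy port from starving the other seven; the actual arbitration policy is not stated in the text.
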
[0066] Next, please refer to Fig. 9, which is a block diagram illustrating an embodiment of the SATA port 600 of Fig. 8. As shown in Fig. 9, the SATA port 600 comprises a superset register 630, a command block register 640, a control block register 650, and a DMA register 620. By filling these registers, data will be transferred between the Dec/Mux arbiter 314 and a transport layer 690 through a dual-port FIFO 660 under the control of a DMA controller 670. The information received by the transport layer 690 will be re-formatted into a frame information structure (FIS) and transmitted to a Link layer 700.

[0067] The Link layer 700 then re-formats the FIS into a frame by adding SOF, CRC, EOF, etc., performs 8b/10b encoding to produce encoded 8b/10b characters, and transmits them to a PHY layer 710.

[0068] The PHY layer 710 transmits signals to a PSD controller in the PSD 420 through a pair of differential signal lines, transmission lines LTX+, LTX-, and receives signals from it through another pair of differential signal lines, reception lines LRX+, LRX-. The two signal lines of each pair, for example LTX+/LTX-, transmit signals TX+/TX- simultaneously at inverse voltages, for example, +V/-V or -V/+V, with respect to a reference voltage Vref, so that the voltage difference will be +2V or -2V, thus enhancing signal quality. The same applies to the transmission of the reception signals RX+/RX- on reception lines LRX+, LRX-.
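
The benefit of driving the pair at opposite levels can be checked numerically: noise that couples equally onto both lines shifts both voltages by the same amount and cancels in the difference. The Python sketch below takes Vref as 0 V and uses an arbitrary amplitude; these values, and the function names, are assumptions rather than Serial ATA electrical parameters.

```python
def drive(bit: int, v: float) -> tuple[float, float]:
    """Return (TX+, TX-) relative to Vref = 0: +V/-V for a 1, -V/+V for a 0."""
    return (v, -v) if bit else (-v, v)


def sense(plus: float, minus: float) -> int:
    """Recover the bit from the sign of the differential voltage TX+ minus TX-."""
    return 1 if plus - minus > 0 else 0


amplitude = 0.25     # arbitrary illustrative swing, not a SATA value
common_noise = 0.4   # noise coupled equally onto both lines
for bit in (0, 1):
    p, m = drive(bit, amplitude)
    assert (p - m) == (2 * amplitude if bit else -2 * amplitude)  # +2V or -2V
    assert sense(p + common_noise, m + common_noise) == bit       # noise cancels
```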

[0069] When receiving a frame from the PHY layer 710, the Link layer 700 will decode the encoded 8b/10b characters and remove the SOF, CRC, and EOF. A CRC will be calculated over the FIS and compared with the received CRC to ensure the correctness of the received information. When receiving the FIS from the Link layer 700, the transport layer 690 will determine the FIS type and distribute the FIS content to the locations indicated by the FIS type.

[0070] A transmission structure complying with the serial ATA protocol is shown in Fig. 10. The information communicated on the serial line is a sequence of 8b/10b encoded characters. The smallest unit thereof is a double-word (32 bits). The contents of each double-word are grouped to provide low-level control information or to transfer information between a host and a device connected thereto. The two types of data structures transmitted on the signal lines are primitives and frames.

[0071] A primitive consists of a single double-word and is the simplest unit of information that may be communicated between a host and a device. When the bytes in a primitive are encoded, the resulting pattern is difficult to misinterpret as another primitive or a random pattern. Primitives are used primarily to convey real-time state information, to control the transfer of information and to coordinate communication between the host and the device. The first byte of a primitive is a special character.

[0072] A frame consists of a plurality of double-words; it starts with an SOF (Start Of Frame) primitive and ends with an EOF (End Of Frame) primitive. The SOF is followed by a user payload called an FIS (Frame Information Structure). A CRC (Cyclic Redundancy Check code) is the last non-primitive double-word immediately preceding the EOF primitive. The CRC is calculated over the contents of the FIS. Flow-control primitives (HOLD or HOLDA) are allowed between the SOF and EOF to adjust the data flow for the purpose of speed matching.
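A sketch of the frame layout described above, modelling primitives as strings and non-primitive double-words as integers. The CRC here is zlib.crc32 over the little-endian FIS bytes, used only as a stand-in; it is not the exact S-ATA CRC definition, so the values will not match real hardware.

```python
import struct
import zlib


def fis_crc(fis_dwords: list) -> int:
    """Stand-in CRC over the FIS contents (not the exact S-ATA CRC definition)."""
    return zlib.crc32(struct.pack(f"<{len(fis_dwords)}I", *fis_dwords))


def build_frame(fis_dwords: list, hold_after=None) -> list:
    """SOF, FIS double-words, CRC, EOF. Optionally insert HOLD primitives
    after the FIS double-word at index hold_after, as a speed-matching stall."""
    frame = ["SOF"]
    for i, dw in enumerate(fis_dwords):
        frame.append(dw)
        if i == hold_after:
            frame += ["HOLD", "HOLD"]
    frame += [fis_crc(fis_dwords), "EOF"]
    return frame


def parse_frame(frame: list) -> list:
    """Strip SOF/EOF and any flow-control primitives, then verify the CRC,
    which is the last non-primitive double-word before EOF."""
    if not frame or frame[0] != "SOF" or frame[-1] != "EOF":
        raise ValueError("frame must start with SOF and end with EOF")
    body = [x for x in frame[1:-1] if not isinstance(x, str)]
    if not body:
        raise ValueError("frame carries no CRC")
    *fis, crc = body
    if fis_crc(fis) != crc:
        raise ValueError("CRC mismatch: frame not received intact")
    return fis


frame = build_frame([0x00000046, 0x11223344, 0x55667788], hold_after=1)
print(parse_frame(frame) == [0x46, 0x11223344, 0x55667788])  # True
```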

[0073] The transport layer constructs FISs for transmission and decomposes FISs received from the link layer. The transport layer does not maintain the context of ATA commands or of previous FIS content. When requested, the transport layer constructs an FIS by gathering the FIS content and placing it in the proper order. There are various types of FIS, two of which are shown in Fig. 11 and Fig. 12.

[0074] As shown in Fig. 11, a DMA setup FIS contains a HEADER in field 0. The first byte (byte 0) thereof defines the FIS type (41h), and the FIS type defines the remaining fields of this FIS and sets its total length at seven double-words. Bit D in byte 1 indicates the direction of the subsequent data transfer: D=1 means transmitter to receiver; D=0 means receiver to transmitter. Bit I in byte 1 is an interrupt bit. Bit R in byte 1 is a reserved bit and is set to 0. The DMA buffer identifier low/high field (field 1) identifies the DMA buffer region in the host memory. The DMA buffer offset field (field 4) is the byte offset into the buffer. The DMA transfer count field (field 5) is the number of bytes that will be read or written by the device.
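A minimal packing sketch for the seven-double-word DMA Setup FIS, assuming the byte layout of the Serial ATA specification: double-words are little-endian, the buffer identifier occupies double-words 1 and 2 (low, high), and D and I are bits 5 and 6 of byte 1. Those bit positions come from the S-ATA specification rather than from this text, so treat them as an assumption.

```python
import struct

DMA_SETUP_FIS_TYPE = 0x41
D_BIT = 1 << 5  # byte 1: direction (1 = transmitter to receiver), assumed position
I_BIT = 1 << 6  # byte 1: interrupt, assumed position


def build_dma_setup_fis(buffer_id: int, offset: int, count: int,
                        to_receiver: bool, interrupt: bool = False) -> bytes:
    """Pack the header and remaining fields into 28 bytes (7 double-words)."""
    flags = (D_BIT if to_receiver else 0) | (I_BIT if interrupt else 0)
    return struct.pack(
        "<BBH6I",
        DMA_SETUP_FIS_TYPE, flags, 0,   # field 0: type, flags, reserved
        buffer_id & 0xFFFFFFFF,         # DMA buffer identifier low
        buffer_id >> 32,                # DMA buffer identifier high
        0,                              # reserved
        offset,                         # field 4: DMA buffer offset
        count,                          # field 5: DMA transfer count
        0,                              # reserved
    )


fis = build_dma_setup_fis(0x1_0000_0002, offset=512, count=4096, to_receiver=True)
print(len(fis) // 4)  # 7
```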

[0075] As shown in Fig. 12, a DATA FIS contains a HEADER in field 0. The first byte (byte 0) thereof defines the FIS type (46h), and the FIS type defines the remaining fields of this FIS and sets its total length at n+1 double-words. The R bits in byte 1 are reserved bits and are set to 0. Fields 1 through n are double-words of data containing the data to be transferred. The amount of data a single DATA FIS may carry is limited (the Serial ATA specification caps it at 8192 bytes, i.e., 2048 double-words).
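A Data FIS can be sketched as one header double-word followed by the payload, giving n+1 double-words for n data double-words. The 8192-byte cap is the Serial ATA specification's limit; requiring the payload to be a whole number of double-words is a simplifying assumption of this sketch.

```python
import struct

DATA_FIS_TYPE = 0x46
MAX_DATA_FIS_PAYLOAD = 8192  # bytes (2048 double-words)


def build_data_fis(payload: bytes) -> bytes:
    """Return header double-word + payload; reserved R bits are left at 0."""
    if len(payload) % 4:
        raise ValueError("payload must be a whole number of double-words")
    if len(payload) > MAX_DATA_FIS_PAYLOAD:
        raise ValueError("payload exceeds the per-FIS limit")
    header = struct.pack("<BBH", DATA_FIS_TYPE, 0, 0)
    return header + payload


fis = build_data_fis(bytes(16))  # n = 4 data double-words
print(len(fis) // 4)             # 5, i.e. n + 1
```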

[0076] In the embodiment of Fig. 4, the host-side IO device interconnect controller 220 and the device-side IO device interconnect controller 300 (SATA IO device interconnect controller 300) can be implemented with the same kind of IC chip, with the IO device interconnect ports in the host-side IO device interconnect controller 220 configured as host-side IO device interconnect ports and the IO device interconnect ports in the device-side IO device interconnect controller 300 configured as device-side IO device interconnect ports. Alternatively, a single chip could be configured to contain both host-side IO device interconnect ports and device-side IO device interconnect ports for coupling concurrently to the host entity 10 and the PSD array 400, respectively.
[0077] Next, examples of the IO flow between the host entity 10 and the SVC 200 and of the IO flow between the SVC 200 and the PSD array 400 will be introduced. Please refer to Fig. 13 and Fig. 14, where an example of the IO flow between the host entity 10 and the SVC 200 is illustrated. IO requests are received from the host entity 10 over one of the host-side IO device interconnects. The IO requests are parsed to determine what operation is to be performed and, for asynchronous device and synchronous device operations, on which section of the logical media unit the operation is to be performed. If the operation consists only of internally-emulated and asynchronous device sub-operations, then the SVC 200 executes the associated sub-operations, including transferring any associated data to/from the host entity 10, and then responds to the host entity 10 with a status report that indicates whether the operation succeeded or failed and for what reason(s). If the operation includes synchronous device operations, then appropriate device-side IO requests are generated and issued to the appropriate PSDs 420 in response. The contents of each individual IO request and the distribution of the IO requests to different PSDs 420 are determined by the nature of the particular mapping that is associated with the particular LMU. Prior to or concurrently with the execution of the device-side IO requests, any "payload" data that is to be obtained from the host entity 10 as part of the host-side IO request execution and then transferred to the PSD 420 is transferred from the host entity 10 to the SVC 200.
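The text leaves the LMU mapping open. As one concrete example, here is a sketch assuming a simple striped (RAID 0-style) LMU with a hypothetical stripe size, showing how one host request is split into per-PSD device-side requests.

```python
from dataclasses import dataclass


@dataclass
class DeviceRequest:
    psd: int     # index of the target PSD
    lba: int     # starting block address on that PSD
    blocks: int  # number of blocks to transfer


def map_striped(lba: int, blocks: int, num_psds: int, stripe_blocks: int) -> list:
    """Split a host request on a striped LMU into device-side requests."""
    requests = []
    while blocks > 0:
        stripe, offset = divmod(lba, stripe_blocks)
        psd = stripe % num_psds                    # stripes rotate across PSDs
        psd_lba = (stripe // num_psds) * stripe_blocks + offset
        n = min(blocks, stripe_blocks - offset)    # stop at the stripe boundary
        requests.append(DeviceRequest(psd, psd_lba, n))
        lba += n
        blocks -= n
    return requests


for r in map_striped(lba=6, blocks=10, num_psds=3, stripe_blocks=4):
    print(r)
```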

[0078] On successful completion of the device-side IO requests, data read in response to the device-side IO requests is delivered to the entity requesting it, which, in a caching implementation, may be the cache, and any data requested by the host is delivered to the host entity 10. A status report is then generated and transmitted to the host entity 10 indicating the success of the operation. If there were device-side IO requests that did not complete successfully, the SVC 200 may engage backup operations that allow successful completion of the sub-operation even in the face of individual device-side IO request failures. Such operations will typically include generating other device-side IO requests to different media sections to recover the data in the case of reads or to write backup data in the case of writes. RAID 5 is an example in which data on other PSDs 420 can be used to regenerate data that could not be read from a particular PSD 420. Alternatively, the SVC 200 could elect to fail the sub-operation, aborting the delivery of data to the host entity 10 and returning a corresponding status report to the host entity 10.
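The RAID 5 regeneration mentioned above relies on parity being the byte-wise XOR of the data blocks in a stripe, so any single missing block equals the XOR of all surviving blocks (data and parity). A minimal sketch with made-up block contents:

```python
def xor_blocks(blocks: list) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            out[i] ^= value
    return bytes(out)


# One stripe across three data PSDs plus parity (contents are illustrative).
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Suppose the read from the PSD holding data[1] fails:
regenerated = xor_blocks([data[0], data[2], parity])
print(regenerated == data[1])  # True
```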
[0079] An example of the IO flow between the SVC 200 and a PSD 420 is shown in Fig. 15 and Fig. 16. For each device-side IO request generated as part of a synchronous device sub-operation, the IO request information defining the parameters of the particular IO operation (e.g., destination media section base address, media section length, command indicating the operation to be performed, etc.) is formatted into a Frame Information Structure (FIS) of the Register Host-to-Device type, packaged into a Serial ATA frame and then transmitted to the associated PSD 420 over the serial ATA interconnect to which the specified PSD 420 is connected. Each frame comprises a Cyclic Redundancy Check (CRC) value computed from the data in the frame, so that if the data in the frame is modified in any way en route to the destination PSD 420, the consistency check performed by the PSD 420 on receipt of the frame against the CRC of the frame will fail, and the PSD 420 will respond by sending an R_ERR primitive to the SVC 200 following the receipt of the frame, indicating that the frame was not received intact. The SVC 200 may then, at its option, re-send the frame or abort the transaction and return a corresponding status report to the requesting entity.
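A sketch of the command path above: the request parameters are packed into a five-double-word Register Host-to-Device FIS (layout per the Serial ATA specification, using the 48-bit LBA fields), and the SVC re-sends the frame whenever the PSD answers R_ERR. The noisy link, the retry limit, and the use of zlib.crc32 as a stand-in for the frame CRC are assumptions for illustration.

```python
import struct
import zlib

REG_H2D = 0x27
C_BIT = 0x80          # byte 1, bit 7: this FIS carries a new command
WRITE_DMA_EXT = 0x35  # ATA command opcode using 48-bit addressing


def build_reg_h2d(command: int, lba: int, count: int) -> bytes:
    """Pack a 20-byte Register Host-to-Device FIS."""
    lba6 = lba.to_bytes(6, "little")
    return struct.pack(
        "<BBBB3sB3sBHBB4x",
        REG_H2D, C_BIT, command, 0,  # type, C bit, command, features
        lba6[:3], 0x40,              # LBA bits 0-23, device (LBA mode)
        lba6[3:], 0,                 # LBA bits 24-47, features (exp)
        count, 0, 0,                 # sector count, reserved, control
    )


def make_noisy_link(corrupt_first: int):
    """Hypothetical link that flips one bit in the first N transmissions and
    returns the primitive the PSD sends back after checking the CRC."""
    sent = 0

    def link(fis: bytes, crc: int) -> str:
        nonlocal sent
        sent += 1
        if sent <= corrupt_first:
            fis = bytes([fis[0] ^ 0x01]) + fis[1:]
        return "R_OK" if zlib.crc32(fis) == crc else "R_ERR"

    return link


def send_with_retry(fis: bytes, link, max_tries: int = 3) -> bool:
    """Re-send the frame on R_ERR; give up (abort) after max_tries."""
    crc = zlib.crc32(fis)
    for _ in range(max_tries):
        if link(fis, crc) == "R_OK":
            return True
    return False


cmd = build_reg_h2d(WRITE_DMA_EXT, lba=0x123456789A, count=8)
print(send_with_retry(cmd, make_noisy_link(corrupt_first=1)))  # True
```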
[0080] If the frame is received intact, the PSD 420 responds by sending an R_OK primitive to the SVC 200 following the receipt of the frame, informing the SVC 200 that the frame was received intact. The PSD 420 parses the request contained in the frame to determine the nature of the operation to be performed and the section of media on which it is to be performed. If the operation is not a valid operation on the specified section of media, or the section of media is not a valid one, the PSD 420 responds to the SVC 200 with a corresponding status report by generating an S-ATA Register-Device-to-Host FIS containing the status information, packaging it into a Serial ATA frame and transmitting it back to the SVC 200. Otherwise, the requested operation is performed.
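The PSD-side validity check can be sketched as follows, assuming a hypothetical capacity in blocks. An out-of-range request yields a Register Device-to-Host FIS (type 34h) whose status byte has ERR set and whose error byte has IDNF set; those ATA bit values are standard, but the function itself and its all-zero address fields are simplifications for illustration.

```python
import struct
from typing import Optional

REG_D2H = 0x34      # Register Device-to-Host FIS type
I_BIT = 0x40        # byte 1, bit 6: interrupt
STATUS_DRDY = 0x40  # ATA status register: device ready
STATUS_ERR = 0x01   # ATA status register: error
ERROR_IDNF = 0x10   # ATA error register: ID (address) not found


def check_request(lba: int, count: int, capacity: int) -> Optional[bytes]:
    """Return an error-status FIS if the media section is invalid, else None
    (meaning the operation may proceed). Address fields are left zero here."""
    if lba < 0 or lba + count > capacity:
        return struct.pack("<BBBB16x", REG_D2H, I_BIT,
                           STATUS_DRDY | STATUS_ERR, ERROR_IDNF)
    return None


print(check_request(992, 8, capacity=1000))            # None
print(check_request(996, 8, capacity=1000)[:4].hex())  # 34404110
```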
[0081] Prior to or during execution of the requested operation, if the transfer of payload data from the SVC 200 to the PSD 420 is required, the PSD 420 will generate and issue an S-ATA frame conveying a DMA-Activate-Device-to-Host FIS requesting that the first set of data be sent. The SVC 200 will then split the data into sections whose lengths do not exceed the maximum length that the S-ATA protocol allows to be transmitted in a single frame. Each section is packaged into a frame of the type Data-Host-to-Device FIS and then shipped to the PSD 420 one at a time. Following the transmission of each frame, the SVC 200 waits for receipt of a frame conveying a DMA-Activate-Device-to-Host FIS from the PSD 420, indicating that the PSD 420 is ready to receive more data, before transmitting the next frame of data. Each frame of data contains a CRC value generated from the data, which the PSD 420 checks for consistency with the data on receipt of each frame. If an inconsistency is discovered between the data in the frame and the CRC of the frame, the IO operation is aborted and the PSD 420 generates a corresponding status report by generating an S-ATA Register-Device-to-Host FIS comprising the status information, packaging it into a Serial ATA frame and transmitting it back to the SVC 200. On receipt of this status, the SVC 200 may, at its option, re-issue the initial IO request to retry the operation or abort the transaction and return a corresponding status report to the requesting entity.
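The write-direction handshake can be sketched as an exchange log: the payload is cut into sections no longer than the per-frame limit (8192 bytes in the Serial ATA specification), and each Data FIS is preceded by a DMA Activate from the PSD. The log format and function name are hypothetical, and CRC failures are not modelled here.

```python
MAX_DATA_FIS_PAYLOAD = 8192  # bytes per Data FIS


def write_exchange(data: bytes, limit: int = MAX_DATA_FIS_PAYLOAD) -> list:
    """List the FISs exchanged when the SVC writes data to a PSD."""
    log = []
    for start in range(0, len(data), limit):
        section = data[start:start + limit]
        log.append("PSD->SVC DMA-Activate-Device-to-Host")  # ready for more data
        log.append(f"SVC->PSD Data-Host-to-Device ({len(section)} bytes)")
    log.append("PSD->SVC Register-Device-to-Host (final status)")
    return log


for entry in write_exchange(bytes(20000)):
    print(entry)
```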
[0082] During execution of the requested operation and/or after the operation is complete, if the transfer of payload data from the PSD 420 to the SVC 200 is required, the PSD 420 prepares the data (which may require reading the data from the storage medium) and splits it into sections whose lengths do not exceed the maximum length that the S-ATA protocol allows to be transmitted in a single frame. Each section is packaged into a frame of the type Data-Device-to-Host FIS and then transmitted to the SVC 200 one frame at a time. Once again, a CRC value is generated from the data in the frame, sent to the SVC 200 in the frame and checked by the SVC 200 for consistency with the data on receipt of each frame. If an inconsistency is discovered between the data in the frame and the CRC of the frame, the SVC 200 will respond by sending an R_ERR primitive to the PSD 420 following the receipt of the frame, indicating that the frame was not received intact. The PSD 420 will then typically abort the IO operation immediately and generate a corresponding status report in the form of an S-ATA Register-Device-to-Host FIS containing the status information, packaging it into a Serial ATA frame and transmitting it back to the SVC 200. On receipt of this status, the SVC 200 may, at its option, re-issue the initial IO request to retry the operation or abort the transaction and return a corresponding status report to the requesting entity.
[0083] If all data is received intact by the SVC 200, it will respond with an R_OK primitive to each of the Data-Device-to-Host FIS frames. When all data that needs to be delivered to the SVC 200 as part of the IO request execution has been transferred to the SVC 200, the PSD 420 will generate a status report indicating whether the operation completed successfully and, if not, for what reason. This status report is formatted into an S-ATA Register-Device-to-Host FIS, packaged into an S-ATA frame and sent back to the SVC 200. The SVC 200 then parses this status report to determine the success or failure of the request. If request failure was reported, the SVC 200 may, at its option, re-issue the initial IO request to retry the operation or abort the transaction and return a corresponding status report to the requesting entity.

[0084] In a legacy Parallel ATA SVC, the overall flow is similar to the above; however, the initial device-side IO request information that defines the parameters of the IO operation (e.g., destination media section base address, media section length, etc.) is not packaged into a frame whose contents are checked for validity, as it is in S-ATA with the frame CRC. Therefore, if such data is inadvertently corrupted (e.g., due to noise) during the transfer from SVC to PSD, the corruption may go undetected, potentially leading to a catastrophic data corruption situation in which data gets written to the wrong section of media, possibly because the destination media section base address and/or media section length contained in the initial IO request data were corrupted. In the S-ATA implementation, such a corruption will be detected because the frame CRC will be inconsistent with the data, and the PSD will then abort the command rather than writing the data to, or reading the data from, the wrong section of the media. This is one of the primary benefits of an S-ATA implementation of an SVC over a P-ATA one.
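A toy illustration of the difference, assuming a hypothetical command block whose first six bytes hold the destination LBA: a single bit flipped by noise silently redirects the transfer when nothing checks the command, whereas a CRC check (zlib.crc32 as a stand-in for the S-ATA frame CRC) catches it so the PSD can abort.

```python
import zlib


def deliver(cmd: bytes, flipped_bit: int, crc_checked: bool):
    """Return the LBA the PSD would act on, or None if it aborts the command."""
    crc = zlib.crc32(cmd)                                  # computed by the sender
    received = bytearray(cmd)
    received[flipped_bit // 8] ^= 1 << (flipped_bit % 8)  # noise on the wire
    if crc_checked and zlib.crc32(bytes(received)) != crc:
        return None                                        # corruption detected
    return int.from_bytes(received[:6], "little")          # possibly the wrong LBA


cmd = (1000).to_bytes(6, "little") + bytes([0x35, 0x08])  # LBA 1000 + other fields
print(deliver(cmd, flipped_bit=20, crc_checked=False))  # 1049576
print(deliver(cmd, flipped_bit=20, crc_checked=True))   # None
```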

[0085] In actual operation, the SVC 200 will typically process and execute multiple operations concurrently in response to multiple queued host-side IO requests. These requests may come from a single host or from multiple hosts. These operations may consist of synchronous device sub-operations that execute concurrently. Each such sub-operation may generate multiple device-side IO device requests addressing different PSDs 420. Each such IO device request may involve transferring significant amounts of data between the SVC 200 and the addressed PSD 420 over the device-side IO device interconnect that connects the two. Typically, the SVC 200 will be configured such that IO device requests get distributed amongst the different PSDs 420 and the different device-side IO device interconnects so as to maximize the collective bandwidth of all the PSDs 420 and IO device interconnects. An example of a configuration that improves collective bandwidth is combining PSDs 420 into logical media units using a RAID 5 approach rather than a RAID 4 combination. In RAID 4, a dedicated parity drive is assigned, to which all parity updates are directed. For writes, where every write requires a parity update, the parity drive may end up being far busier than the data drives. For reads, the parity drive is not accessed, meaning that there is one drive that is not contributing to the task of delivering data. RAID 5 distributes the parity equally among all drives so that, assuming IO requests address physical media in an evenly distributed way, no one drive is busier than any other on writes and all drives contribute to the task of delivering data on reads.
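The RAID 4 versus RAID 5 argument can be checked by counting parity updates per drive when each stripe receives one write. RAID 5 implementations rotate parity in several ways; the rotation below (parity moving one drive to the left on each stripe) is one common choice, assumed here.

```python
from collections import Counter


def parity_drive(stripe: int, num_drives: int, level: int) -> int:
    """Index of the drive holding parity for a given stripe."""
    if level == 4:
        return num_drives - 1                          # dedicated parity drive
    if level == 5:
        return (num_drives - 1) - stripe % num_drives  # rotating parity
    raise ValueError("only RAID 4 and RAID 5 are modelled")


def parity_updates(num_stripes: int, num_drives: int, level: int) -> Counter:
    """Parity writes per drive when every stripe receives one write."""
    return Counter(parity_drive(s, num_drives, level) for s in range(num_stripes))


print(dict(sorted(parity_updates(8, 4, level=4).items())))  # {3: 8}
print(dict(sorted(parity_updates(8, 4, level=5).items())))  # {0: 2, 1: 2, 2: 2, 3: 2}
```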

[0086] In addition, the SVC 200 may be equipped with the means and intelligence to dynamically adjust the distribution of IO device requests among the various PSDs 420 and/or IO device interconnects to further optimize collective PSD/interconnect bandwidth. An example of this is load-balancing between IO interconnects that connect to the same set of PSDs 420. The SVC 200 may intelligently keep track of the IO device requests that were delivered over each interconnect and from this information determine over which interconnect the next IO device request should be sent in order to maximize the collective bandwidth of the interconnects. Another example is load-balancing data read IO requests between a mirrored set of PSDs 420. Once again, the SVC 200 may intelligently keep track of the IO device requests that were addressed to each PSD 420 to determine to which PSD the next IO device request should be sent in order to maximize the collective bandwidth of the mirrored set of PSDs 420.
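One simple policy for the request tracking described above is "least outstanding requests": send the next request over the interconnect (or to the mirrored PSD) with the fewest requests still in flight. This is an assumed policy for illustration; the text does not prescribe a specific algorithm.

```python
class LeastOutstandingBalancer:
    """Pick the path (interconnect or mirrored PSD) with the fewest in-flight requests."""

    def __init__(self, paths):
        self.outstanding = {path: 0 for path in paths}

    def issue(self):
        # Ties go to the earliest-listed path because min() keeps the first minimum.
        path = min(self.outstanding, key=self.outstanding.get)
        self.outstanding[path] += 1
        return path

    def complete(self, path):
        self.outstanding[path] -= 1


mirror = LeastOutstandingBalancer(["PSD-A", "PSD-B"])
first, second = mirror.issue(), mirror.issue()  # PSD-A, then PSD-B
mirror.complete(first)
print(mirror.issue())                           # PSD-A
```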



[0087] Given the ability to maximize the collective bandwidth of the device-side IO device interconnects, the overall performance of a storage virtualization subsystem in which the SVC plays a key role may, under certain types of host IO request loads, be limited by this collective bandwidth. Under such conditions, increasing the collective bandwidth may result in significantly better performance. Overall device-side IO device interconnect performance is determined by two factors: the IO-request-execution/data-transfer rate of the interconnect and the number of interconnects over which IO requests/data transfers are distributed. The higher the IO-request-execution/data-transfer rate of the interconnect, the greater the overall performance. Similarly, the more interconnects there are over which device-side IO requests and data transfers can be distributed, the greater the overall performance of the device-side IO interconnect subsystem.
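As a rough worked example, assuming requests are spread evenly over n interconnects of rate r and that nothing else (PSD media rate, SVC processing) is the bottleneck, the collective device-side bandwidth grows with both factors. The 150 MB/s figure is the raw byte rate of a first-generation 1.5 Gb/s S-ATA link after 8b/10b encoding.

```latex
B_{\text{total}} \approx n \cdot r, \qquad
n = 8,\; r = 150\ \text{MB/s} \;\Longrightarrow\; B_{\text{total}} \approx 1200\ \text{MB/s}
```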

[0088] As mentioned before, Parallel ATA interconnects do not scale well beyond a certain point because of the number of dedicated signal lines (28) that must be used to form each distinct interconnect. A typical P-ATA SVC may incorporate no more than 12 distinct device-side P-ATA IO device interconnects because of the number of signal lines per interconnect. Parallel SCSI not only suffers from the same drawback, with 68 signal lines per interconnect, but is also significantly more expensive and occupies a significantly larger footprint per interconnect on a printed circuit board (PCB) than either P-ATA or S-ATA. A typical SVC might implement 4 to 8 distinct device-side Parallel SCSI IO device interconnects, each costing several times as much as a P-ATA/S-ATA interconnect. Fibre interconnects do not scale well because of their PCB footprint and cost per interconnect, which are typically an order of magnitude greater than those of P-ATA/S-ATA.
[0089] Serial ATA IO device interconnects scale well because each interconnect consists of only 4 signal lines and they allow for a high level of integration, such that a single S-ATA controller IC may support 8 interconnects versus 2 for standard Parallel SCSI or Fibre controllers of equivalent pin count and size. Further, S-ATA per-interconnect cost is low enough to permit large numbers of device-side S-ATA IO device interconnects to be included on a single cost-effective SVC.
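The scaling argument in the two paragraphs above comes down to signal lines per interconnect. A quick tally using the counts given in the text (28 for P-ATA, 68 for Parallel SCSI, 4 for S-ATA); the interconnect counts passed in are examples only.

```python
SIGNAL_LINES = {"P-ATA": 28, "Parallel SCSI": 68, "S-ATA": 4}


def total_signal_lines(kind: str, interconnects: int) -> int:
    """Dedicated signal lines needed for a number of device-side interconnects."""
    return SIGNAL_LINES[kind] * interconnects


print(total_signal_lines("P-ATA", 12))         # 336
print(total_signal_lines("Parallel SCSI", 8))  # 544
print(total_signal_lines("S-ATA", 16))         # 64
```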

[0090] One limitation of a "pure" Serial ATA SVC, in which all of the device-side IO device interconnects are Serial ATA, is that the number of PSDs that can be connected is limited by the number of device-side IO device interconnects that can be packed onto a single SVC; and, because the S-ATA specification only allows for maximum signal line lengths of 1.5 m, the PSDs connected to one SVC must be packed close enough that no signal line length exceeds 1.5 m. Because of these limitations, a typical S-ATA storage virtualization subsystem will only provide for connection of a maximum of 16 S-ATA PSDs. So a "pure" S-ATA storage virtualization subsystem is unable to match the expandability of a Fibre FC-AL storage virtualization subsystem, which will typically allow for connection of up to 250 PSDs via connection of external expansion chassis on the same set of device-side IO device interconnects.
[0091] In order to overcome this limitation, the present invention optionally includes on the SVC one or more expansion device-side multiple-device IO device interconnects, herein referred to as device-side expansion ports, such as Parallel SCSI or Fibre FC-AL. These interconnects will typically be wired in such a way as to allow external connection of external expansion chassis. These chassis could be simple "native" JBODs (just a bunch of drives) of PSDs directly connected to the interconnect without any intervening conversion circuitry, or could be intelligent JBOD emulation subsystems that emulate "native" JBODs using a combination of S-ATA or P-ATA PSDs and a single or redundant set of SVCs. These SVCs provide the conversion from the multiple-device IO device interconnect protocol, which connects the JBOD subsystem to the primary storage virtualization subsystem, to the device-side IO device interconnect protocol (S-ATA or P-ATA), which connects the JBOD SVC(s) to the PSDs that they manage.

[0092] Please refer to Fig. 17, which is an embodiment diagram of a storage virtualization subsystem supporting device-side expansion ports. In Fig. 17, each storage unit to which the expansion ports are connected is single-ported. However, if the storage units to which the expansion ports are connected are dual-ported, then an SVC equipped with one or more pairs of redundantly-configured expansion ports could have one of the ports in a redundant pair connected to one of the ports in the dual-ported pair in a storage unit and the other port in the redundant pair connected to the other port in the storage unit's dual-ported pair. Fig. 18 depicts such a configuration. In this case, if one of the SVC ports in the redundant pair malfunctions, or one of the storage unit ports in the dual-port pair malfunctions, or one of the IO device interconnects connecting one of the SVC ports in the redundant pair to one of the storage unit ports in the dual-port pair breaks or becomes blocked, the SVC can still access the storage unit via the alternate path consisting of the interconnect connecting the alternate SVC port to the alternate storage unit port.
[0093] There is a variation on the Serial ATA storage virtualization subsystem that uses Parallel ATA PSDs rather than Serial ATA PSDs. It incorporates a SATA-to-PATA conversion circuit, residing in close proximity to the P-ATA PSD, that converts S-ATA signals and protocol to P-ATA and back again in the opposite direction. While the short length of P-ATA signal traces between the SATA-to-PATA conversion circuit and the PSD will potentially be vulnerable to undetected corruption of information passed between the SVC and the PSD, the backplane signal traces, which are especially vulnerable to noise and cross-talk because of their length and their number (as mentioned before, P-ATA requires 28 signal traces per interconnect), will be protected from undetected data corruption by virtue of the improved error detection capabilities of S-ATA. Other than this, virtually all of the benefits of an S-ATA storage virtualization subsystem over ones incorporating other standard device-side IO device interconnects are still achieved in this variation.

[0094] The importance of a Serial ATA storage virtualization subsystem that uses Parallel ATA PSDs lies in the fact that, in the short term, Serial ATA drives will still be in relatively short supply compared to Parallel ATA drives and will still be significantly more expensive. During this transitional period, this kind of subsystem will allow P-ATA PSDs to be substituted for S-ATA PSDs, eliminating concerns over S-ATA PSD supply and cost. Such a subsystem will typically place the conversion circuit in the removable canister in which the PSD resides. The removable canister allows the PSD and any related circuitry to be easily swapped out in the event that the PSD and/or related circuitry needs servicing. Because the conversion circuit is placed in the canister, when S-ATA drives become readily available at a competitive price, the entire canister contents could be exchanged for an S-ATA PSD and S-ATA-related circuitry.

[0095] Please refer to Fig. 19 and Fig. 20. Fig. 19 depicts a block diagram of a removable P-ATA-PSD canister, while Fig. 20 depicts a block diagram of a removable S-ATA-PSD canister. Both canisters of Fig. 19 and Fig. 20 have an S-ATA IO device interconnect coming in from the SVC. The primary difference is the presence of a SATA-to-PATA conversion circuit in the removable P-ATA-PSD canister, which is absent from the removable S-ATA-PSD canister.
[0096] Another feature that an SVC might typically implement is redundancy in the host-side interconnects, in which multiple host-side interconnect ports are included on the SVC and LMUs are presented to the host identically over two or more of these interconnects. This feature is designed to give the host the ability to maintain access to the LMU even if one of the interconnects and/or ports on the interconnect should break, become blocked or otherwise malfunction.

[0097] In this implementation, the two separate SVC host-side ports connect to two entirely separate host-side IO device interconnects and host ports (not shown). In an implementation supporting redundancy in the host-side interconnects, the SVC would present the same set of logical media units in an identical fashion on both ports.

[0098] Storage virtualization subsystems typically also include functionality that allows devices in the subsystems, such as power supplies, fans, temperature monitors, etc., to be managed and monitored by the SVC(s). As mentioned before, this functionality is commonly referred to as enclosure management services (EMS). Often, EMS is implemented using intelligent circuitry, that is, circuitry that includes a CPU and runs a software program to achieve the desired functionality. Traditional Parallel SCSI and Fibre SV subsystems have typically relied on the standard SCSI protocols SAF-TE and SES, respectively, as the primary communication mechanism that the SVC uses to communicate with the SVS's EMS. These protocols, in turn, rely on a connection between the SVC(s) and the SVS consisting of an IO device interconnect that provides transport of the SCSI command protocol, such as Parallel SCSI or Fibre interconnects. However, in the typical S-ATA SVS, there is no such connection between the SVC(s) and the "local" SVS (note that the expansion ports do provide such a connection to "remote" JBOD subsystems, but not to the "local" SVS). Such a connection could be implemented, but it would increase the cost of the SVS significantly. A more cost-effective solution would be to use a low-cost interconnect and communicate over this interconnect using proprietary protocols.
[0099] I2C is a low-cost interconnect that supports two-way transfer of data at an acceptable transfer rate. It is commonly used in PCs to allow the CPU to manage and monitor motherboard and other device status. It is well suited to the task of providing a communication medium between the SVC(s) and the EMS in the local SVS, especially in S-ATA SVSs that do not already have an interconnect connecting the SVC(s) to the SVS. However, it does not, as standard, support transport of the SCSI command protocol, so any implementation using it as the primary communication medium between the SVCs and the EMS in the local SVS will communicate using an alternate protocol, typically a proprietary one.
[0100] With I2C as the primary communication medium, EMS could be implemented in two ways. The first is to use intelligent circuitry that communicates with the SVC using an intelligent protocol similar to SAF-TE/SES. The second is to integrate "dumb" off-the-shelf I2C latches and/or status monitoring ICs into a management/monitoring circuit and leave all the intelligence to the SVC. The former option has the advantage of allowing the EMS to provide more advanced services, value and customizations; however, it is typically complicated and expensive to implement. The latter option is easy and inexpensive to implement but typically cannot support advanced functionality.
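With the second, "dumb" option, the SVC reads raw bytes from I2C latches and interprets them itself. A sketch of that interpretation step, assuming a hypothetical latch whose bits 0-3 report PSU/fan health (1 = healthy) and whose bit 4 is an over-temperature alarm; the actual I2C read is outside the sketch.

```python
# Hypothetical bit assignments for one status latch.
HEALTH_BITS = {0: "PSU 1", 1: "PSU 2", 2: "Fan 1", 3: "Fan 2"}  # 1 = healthy
OVERTEMP_BIT = 4                                                # 1 = alarm


def decode_latch(value: int) -> list:
    """Translate a raw latch byte into the faults the SVC should report."""
    faults = [f"{name} failed" for bit, name in HEALTH_BITS.items()
              if not (value >> bit) & 1]
    if (value >> OVERTEMP_BIT) & 1:
        faults.append("over-temperature")
    return faults


print(decode_latch(0b01111))  # []
print(decode_latch(0b10101))  # ['PSU 2 failed', 'Fan 2 failed', 'over-temperature']
```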
[0101] The PSD subsystems that storage virtualization subsystems are designed to emulate typically implement enclosure management services that can be directly managed and monitored by a host over the IO device interconnects that also serve as the primary access interconnects for the PSDs in the subsystem. In these implementations, the EMS circuitry is intelligent and implements standard SCSI protocols for managing and monitoring EMS, such as SAF-TE and SES, that can be transported over the primary access interconnects. In these implementations, EMS controllers will either connect directly to one or more of the primary access IO device interconnects to allow communication with the host directly, a configuration herein referred to as "direct-connect", or rely on pass-through mechanisms supported by those devices that are directly connected to the primary access interconnects (typically, PSDs) to forward requests and associated data from the host to the EMS controller and responses and associated data from the EMS controller to the host, herein referred to as "device-forwarded". Direct-connect EMS implementations provide independence from PSDs, such that a failure or absence of one or even all PSDs would not affect the operation or accessibility of the EMS. The drawback of direct-connect EMS implementations is that they are typically more expensive and complicated to implement. The advantage of a device-forwarded EMS implementation lies in its ease of implementation and relative cost-effectiveness, but it suffers from the weakness that failing or absent PSDs could result in the loss of access to the EMS by the host(s).
[0102] In order to enhance compatibility with hosts that are designed to interface with actual PSD subsystems, an SVS that is equipped with EMS might support one or more standard SCSI EMS management protocols and one or both of the connection configurations described above, direct-connect and device-forwarded. For direct-connect emulations, the SVC will present the EMS services on a host-side IO device interconnect as one or more IDs/LUNs (logical unit numbers). The EMS may have dedicated interconnect IDs assigned to it, or it may simply be assigned LUNs on IDs that already present other LUNs. For SAF-TE emulations, the SVC must present EMS SAF-TE device(s) on dedicated IDs. For direct-connect SES emulations, the EMS SES device(s) could be presented on dedicated IDs or on IDs presenting other LUNs. For device-forwarded emulations, the SVC will simply include information in the INQUIRY string of the virtual PSDs responsible for forwarding the EMS management requests that indicates to the host that one of the functions of that PSD is to forward such requests to the EMS. Typically, multiple virtual PSDs, and possibly even all of the virtual PSDs presented on the interconnect, will be presented as forwarders of EMS management requests, so that the absence or failure of one or more virtual PSDs will not result in loss of access to the EMS.
[0103] Those skilled in the art will readily observe that numerous modifications and alterations of the device may be made while retaining the teaching of the invention. Accordingly, the above disclosure should be construed as limited only by the metes and bounds of the appended claims.



